.TH "makehmmerdb" 1 "@HMMER_DATE@" "HMMER @HMMER_VERSION@" "HMMER Manual"

.SH NAME
makehmmerdb - build a HMMER binary database file from a sequence file


.SH SYNOPSIS
.B makehmmerdb
.I [options]
.I <seqfile>
.I <binaryfile>


.SH DESCRIPTION

.PP
.B makehmmerdb
creates a binary database file from a DNA sequence file. This
binary file can be used as a target database for the DNA search tool
.BR nhmmer .
With default settings in
.BR nhmmer ,
searching the binary database instead of the sequence file yields a
roughly 10-fold acceleration, with a small loss of sensitivity on
benchmarks. (This method has been extensively tested, but should still
be treated as somewhat experimental.)


.SH OPTIONS

.TP
.B -h
Help; print a brief reminder of command line usage and all available
options.


.\" .SH OPTIONS FOR SPECIFYING THE ALPHABET
.\" 
.\" The alphabet type (amino, DNA, or RNA) is autodetected by default, by
.\" looking at the composition of the
.\" .IR seqfile .
.\" Autodetection is normally quite reliable, but occasionally alphabet
.\" type may be ambiguous and autodetection can fail (for instance, when
.\" the first sequence starts with a run of ambiguous characters). To avoid 
.\" this, or to increase robustness in automated analysis pipelines, you 
.\" may specify the alphabet type of
.\" .I seqfile
.\" with these options.
.\" 
.\" .TP
.\" .B --dna
.\" Specify that all sequences in 
.\" .I msafile
.\" are DNAs.
.\" 
.\" .TP
.\" .B --rna
.\" Specify that all sequences in 
.\" .I msafile
.\" are RNAs.
.\" 
.\" .TP
.\" .B --amino
.\" Specify that all sequences in 
.\" .I msafile
.\" are proteins. Note that currently, a binary database of amino
.\" acid sequence cannot be used as target to hmmsearch of phmmer
.\" (only nhmmer can use the binary format).



.SH OTHER OPTIONS

.TP
.BI --informat " <s>"
Assert that the input sequence file
.I <seqfile>
is in format
.IR <s> .
Accepted formats include 
.IR fasta , 
.IR embl , 
.IR genbank ,
.IR ddbj , 
.IR uniprot ,
.IR stockholm , 
.IR pfam , 
.IR a2m , 
and 
.IR afa .
The default is to autodetect the format of the file.


.TP 
.BI --bin_length " <n>"
Bin length. The binary file is built around a data structure called the
FM index, which organizes a permuted copy of the sequence into bins
of length
.IR <n> .
Because data are stored for each bin, longer bins produce smaller
files but may slow queries. The default is 256; values much larger
than 512 may noticeably reduce search speed.
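.IP
A minimal, hypothetical sketch in Python may help illustrate this trade-off. It assumes that each bin stores cumulative symbol counts (an "occurrence checkpoint") for the permuted sequence, so that counting a symbol needs only the nearest checkpoint plus a scan inside one bin. The function names and layout are illustrative assumptions, not taken from the HMMER source.
.nf
```python
# Hypothetical sketch, not HMMER's data layout: cumulative symbol counts
# ("checkpoints") stored once per bin of the permuted sequence.


def build_checkpoints(bwt, alphabet, bin_length):
    """Record cumulative counts of each symbol at every bin boundary."""
    checkpoints = []
    running = {c: 0 for c in alphabet}
    for i in range(len(bwt) + 1):
        if i % bin_length == 0:
            checkpoints.append(dict(running))  # counts in bwt[0:i]
        if i < len(bwt):
            running[bwt[i]] += 1
    return checkpoints


def occ(bwt, checkpoints, bin_length, c, i):
    """Count occurrences of symbol c in bwt[0:i]."""
    b = i // bin_length
    count = checkpoints[b][c]  # counts up to the start of bin b
    # Scan the remaining (at most bin_length - 1) symbols in bin b.
    count += bwt[b * bin_length:i].count(c)
    return count


bwt = "ACGTACGTTTGA"
for bin_length in (4, 8):
    cps = build_checkpoints(bwt, "ACGT", bin_length)
    # prints "4 4 4", then "8 2 4"
    print(bin_length, len(cps), occ(bwt, cps, bin_length, "T", 10))
```
.fi
.IP
For this 12-letter string, a bin length of 4 stores four checkpoints and a bin length of 8 stores two, while the longest scan inside a bin grows from 3 to 7 letters.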


.TP 
.BI --sa_freq " <n>"
Suffix array sample rate. The FM index also stores a sample of the
underlying suffix array of the sequence database. More frequent
sampling (a smaller value of
.IR <n> )
yields a larger file and faster searches, until the file grows large
enough that I/O becomes the bottleneck. The default is 8;
.I <n>
must be a power of 2.
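.IP
The following hypothetical Python sketch shows one common way suffix array sampling works in an FM index. It assumes that samples are kept only for text positions divisible by <n>, and that an unsampled entry is recovered by stepping backward through the text with the LF mapping of the Burrows-Wheeler transform until a sampled entry is reached, which takes at most <n> - 1 steps. Requiring a power of 2 lets the divisibility test be a bit mask; that rationale is an assumption, as are all names in the code.
.nf
```python
# Hypothetical sketch, not HMMER's implementation: keep suffix array
# entries only for text positions divisible by sa_freq, and recover the
# others by walking the LF mapping of the Burrows-Wheeler transform.


def build_index(text, sa_freq):
    if sa_freq < 1 or sa_freq & (sa_freq - 1):
        raise ValueError("sa_freq must be a power of 2")
    text += "$"  # sentinel; sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # text[-1] is "$" when i == 0
    first, total = {}, 0  # first[c]: number of symbols smaller than c
    for c in sorted(set(text)):
        first[c] = total
        total += text.count(c)
    mask = sa_freq - 1  # pos % sa_freq == pos & mask for powers of 2
    samples = {row: pos for row, pos in enumerate(sa) if (pos & mask) == 0}
    return bwt, first, samples


def locate(bwt, first, samples, row):
    """Return the text position of the suffix in the given row."""
    steps = 0
    while row not in samples:
        c = bwt[row]
        # LF mapping: row of the suffix starting one position earlier.
        row = first[c] + bwt[:row].count(c)
        steps += 1
    return samples[row] + steps


bwt, first, samples = build_index("ACGTTGCAACGTAGGA", 4)
# prints "5 [16, 15, 7]"
print(len(samples), [locate(bwt, first, samples, r) for r in range(3)])
```
.fi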


.TP 
.BI --block_size " <n>"
Break the input sequence into blocks of
.I <n>
million letters each, and build a separate FM index for each block
rather than a single FM index for the entire sequence database. The
default is 50. Larger blocks do not appear to yield a substantial
speed increase.



.SH SEE ALSO 

See
.BR hmmer (1)
for a master man page with a list of all the individual man pages
for programs in the HMMER package.

.PP
For complete documentation, see the user guide that came with your
HMMER distribution (Userguide.pdf); or see the HMMER web page
(@HMMER_URL@).



.SH COPYRIGHT

.nf
@HMMER_COPYRIGHT@
@HMMER_LICENSE@
.fi

For additional information on copyright and licensing, see the file
called COPYRIGHT in your HMMER source distribution, or see the HMMER
web page 
(@HMMER_URL@).


.SH AUTHOR

.nf
Eddy/Rivas Laboratory
Janelia Farm Research Campus
19700 Helix Drive
Ashburn VA 20147 USA
http://eddylab.org
.fi



